qeML (version 1.1)

R Factor Utilities

Description

Utilities to manipulate R factors, extending the ones in regtools.

Usage

levelCounts(data)
dataToTopLevels(data,lowCountThresholds)
factorToTopLevels(f,lowCountThresh=0)
cartesianFactor(dataName,factorNames,fNameSep = ".")
qeRareLevels(x, yName, yesYVal = NULL)

Arguments

data

A data frame or equivalent.

f

An R factor.

lowCountThresh

Factor levels with counts below this value will not be used for this factor.

lowCountThresholds

An R list of column names and their corresponding values of lowCountThresh.

dataName

A quoted name of a data frame or equivalent.

factorNames

A vector of R factor names.

fNameSep

A character to be used as a delimiter in the names of the levels of the output factor.

x

A data frame.

yName

Quoted name of the response variable.

yesYVal

In the case of binary Y, the factor level to be considered positive.

Author

Norm Matloff

Details

Often one has an R factor in which one or more levels are rare in the data. This can cause problems, say in cross-validation: a level in the test set might be "new," not having appeared in the training set. To address this, factorToTopLevels removes rare levels from a factor; dataToTopLevels applies this to an entire data frame.
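To make the idea of removing rare levels concrete, here is a minimal base-R sketch. The helper dropRareLevels is hypothetical and is not part of qeML; in particular, setting observations at rare levels to NA is an assumption made for illustration, and factorToTopLevels may handle them differently.

```r
# Hypothetical helper (not a qeML function): keep only the levels whose
# count is at least `thresh`. Observations at rarer levels become NA,
# which is an assumption of this sketch.
dropRareLevels <- function(f, thresh) {
  counts <- table(f)
  keep <- names(counts)[counts >= thresh]
  factor(f, levels = keep)
}

f <- factor(c("a", "a", "a", "b", "b", "c"))
g <- dropRareLevels(f, 2)
levels(g)                  # the levels that survived the threshold
table(g, useNA = "ifany")  # counts, including any NA entries
```

Here the level "c" occurs only once, so with a threshold of 2 it no longer appears among the levels of g.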

To help with this, the function levelCounts simply applies table() to each column of data, returning the result as an R list. (If more than 10 levels, it returns NA.)
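The per-column tabulation described here can be approximated in base R as follows. The function countLevels and its maxLevels argument are hypothetical names, and the sketch assumes the NA rule is applied column by column; it is not the package's source.

```r
# Hypothetical stand-in for the behavior described above: tabulate each
# column, returning NA for a column with more than `maxLevels` distinct values.
countLevels <- function(data, maxLevels = 10) {
  lapply(data, function(col) {
    tbl <- table(col)
    if (length(tbl) > maxLevels) NA else tbl
  })
}

d <- data.frame(
  id = 1:11,                                     # 11 distinct values
  g  = factor(rep(c("a", "b"), length.out = 11)) # 2 levels
)
counts <- countLevels(d)
counts$g    # table of counts for the two levels
counts$id   # NA, since the column has more than 10 distinct values
```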

The function cartesianFactor generates a "superfactor" from individual ones; e.g. if factors f1 and f2 have n1 and n2 levels, the output is a new factor with n1 * n2 levels.
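The n1 * n2 level count can be checked with base R's interaction(), used here only as a stand-in for the idea of a "superfactor". This is not a claim about how cartesianFactor is implemented, and the ordering of the levels may differ from what cartesianFactor produces.

```r
# Base-R illustration of a product factor; the toy data are made up.
f1 <- factor(c("female", "male", "female", "male"))  # 2 levels
f2 <- factor(c("100", "101", "102", "100"))          # 3 levels

# interaction() keeps all level combinations by default (drop = FALSE)
z <- interaction(f1, f2, sep = ".")
nlevels(z)        # equals nlevels(f1) * nlevels(f2)
as.character(z)   # each value joins the two component levels with "."
```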

The function qeRareLevels checks each column of a data frame to see whether it is an R factor with rare levels.
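As a rough illustration of scanning a data frame for factors with rare levels, the base-R sketch below lists, for each factor column, the levels whose counts fall below a threshold. The function rareLevels and its thresh argument are hypothetical; the sketch ignores the yName and yesYVal arguments and should not be read as the logic of qeRareLevels.

```r
# Hypothetical helper (not a qeML function): for each factor column,
# list the levels that occur fewer than `thresh` times.
rareLevels <- function(x, thresh) {
  isFac <- vapply(x, is.factor, logical(1))
  out <- lapply(x[isFac], function(col) {
    counts <- table(col)
    names(counts)[counts < thresh]
  })
  out[lengths(out) > 0]   # keep only columns that have rare levels
}

d <- data.frame(
  occ    = factor(c("a", "a", "a", "b")),
  gender = factor(c("f", "m", "f", "m")),
  age    = c(30, 41, 25, 52)              # non-factor, skipped
)
rare <- rareLevels(d, 2)
rare
```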

Examples

library(qeML)
data(svcensus)
levelCounts(svcensus)  # e.g. finds there are 15182 men, 4908 women
f1 <- svcensus$gender  # 2 levels
f2 <- svcensus$occ  # 6 levels
z <- cartesianFactor('svcensus',c('gender','occ'))
head(z)
# [1] female.102 male.101   female.102 male.100   female.100 male.100  
# 12 Levels: female.100 female.101 female.102 female.106 ... male.141